77 research outputs found

    Terminology extraction: an analysis of linguistic and statistical approaches

    Are linguistic properties and behaviors important for recognizing terms? Are statistical measures effective for extracting terms? Is it possible to capture a sort of termhood with computational linguistic techniques? Or are terms perhaps too sensitive to exogenous and pragmatic factors to be confined within computational linguistics? All these questions are still open. This study tries to contribute to the search for an answer, in the belief that it can be found only through a careful experimental analysis of real case studies and a study of their correlation with theoretical insights.

    Generic ontology learners on application domains


    Flames recognition for opinion mining

    The emerging worldwide e-society creates new ways of interaction between people with different cultures and backgrounds. Communication systems such as forums, blogs, and comments are easily accessible to end users. In this context, managing user-generated content has proved to be a difficult but necessary task. Studying and interpreting the user-generated data and text available on the Internet is complex and time-consuming for any human analyst. This study proposes an interdisciplinary approach to modelling the flaming phenomenon (hot, aggressive discussions) in online Italian forums. The model is based on the analysis of psychological, cognitive, and linguistic interaction modalities among participants in web communities, on state-of-the-art machine learning techniques, and on natural language processing technology. Administrators, moderators, and users of virtual communities could benefit directly from this research. A further positive outcome is the opportunity to better understand and model the dynamics of web forums as the basis for developing opinion mining applications with a commercial focus.

    Bridging the demand and the offer in data science

    During the last several years, we have observed an exponential increase in the demand for Data Scientists in the job market. As a result, a number of training offerings, courses, books, and university educational programs (at undergraduate, graduate, and postgraduate levels) have been labeled as “Big data” or “Data Science”; the fil‐rouge of all of them is the aim of training people with the right competencies and skills to satisfy the needs of the business sector. In this paper, we report on some of the exercises done in analyzing the current Data Science education offer and matching it with the needs of the job market, in order to propose a scalable matching service, i.e., COmpetencies ClassificatiOn (E‐CO‐2), based on Data Science techniques. The E‐CO‐2 service can help to extract relevant information from Data Science–related documents (course descriptions, job ads, blogs, or papers), which enables the comparison of demand and offer in the field of Data Science education and HR management, ultimately helping to establish the profession of Data Scientist.

    Creating a medical dictionary using word alignment: The influence of sources and resources

    Background: Automatic word alignment of parallel texts with the same content in different languages is used, among other things, to generate dictionaries for new translations. The quality of the generated word alignment depends on the quality of the input resources. In this paper we report on automatic word alignment of the English and Swedish versions of the medical terminology systems ICD-10, ICF, NCSP, KSH97-P and parts of MeSH, and on how the terminology systems and the types of resources influence the quality.
    Methods: We automatically word-aligned the terminology systems using static resources, such as dictionaries; statistical resources, such as statistically derived dictionaries; and training resources, which were generated from manual word alignment. We varied which parts of the terminology systems we used to generate the resources, which parts we word-aligned, and which types of resources we used in the alignment process, in order to explore the influence that the different terminology systems and resources have on recall and precision. After the analysis, we used the best configuration of the automatic word alignment to generate candidate term pairs. We then manually verified the candidate term pairs and included the correct pairs in an English-Swedish dictionary.
    Results: The results indicate that more resources and resource types give better results, but the size of the parts used to generate the resources only partly affects the quality. The most generally useful resources were generated from ICD-10, and resources generated from MeSH were not as general as other resources. Systematic inter-language differences in the structure of the terminology system rubrics make the rubrics harder to align. Manually created training resources give nearly as good results as a union of static resources, statistical resources and training resources, and noticeably better results than a union of static resources and statistical resources. The verified English-Swedish dictionary contains 24,000 term pairs in base forms.
    Conclusion: More resources give better results in the automatic word alignment, but some resources give only small improvements. The most important type of resource is training resources, and the most general resources were generated from ICD-10.

    CODHIR - AN INFORMATION-RETRIEVAL SYSTEM BASED ON SEMANTIC DOCUMENT REPRESENTATION

    An information retrieval (IR) system, implemented as part of the content-driven hypertextual information retrieval (CoDHIR) project, is described. This work focuses on the use of semantic information that can be automatically acquired by applying natural language processing (NLP) techniques to texts. The information is represented using conceptual graphs. The problem of synonyms and homonyms is addressed in our system by using a model based on the interpretation of conceptual graphs extracted from texts. The detection of the contextual roles of words allows an improvement in retrieval precision over traditional IR technologies. Ranking of documents, based on document relevance, is obtained by extending the vector space model into an oblique space and taking into account the relevance among different word pairs.

    Semi-automatic ontology development: processes and resources

    The exploitation of theoretical results in knowledge representation, language standardization by the W3C, and data publication initiatives such as Linked Open Data have given a level of concreteness to the field of ontology research. In light of these recent outcomes, ontology development has also found its way to the forefront, benefiting from years of R&D on development tools. Semi-Automatic Ontology Development: Processes and Resources includes state-of-the-art research results aimed at making the automation of ontology development processes and the reuse of external resources a reality, and is thus of interest to a wide and diversified community of users. This book provides a thorough overview of the current efforts on this subject and suggests common directions for interested researchers and practitioners.

    An environment for semi-automatic annotation of ontological knowledge with linguistic content.

    Both the multilingual aspects that characterize the (Semantic) Web and the demand for forms of knowledge representation that are easier to share and equally accessible to humans and machines drive the need for a more "linguistically aware" approach to ontology development. Ontologies should thus express knowledge by associating formal content with explicative linguistic expressions, possibly in different languages. With such an approach, the intended meaning of concepts and roles is expressed more clearly for humans, which facilitates (among other things) the reuse of existing knowledge, while automatic content mediation between autonomous information sources has a far better chance of succeeding than it would otherwise. In past work we introduced OntoLing [7], a Protégé plug-in offering a modular and scalable framework for performing manual annotation of ontological data with information from different, heterogeneous linguistic resources. We now present an improved version of OntoLing, which supports the user with automatic suggestions for enriching ontologies with linguistic content. Several specific linguistic enrichment problems are discussed, and we show how they have been tackled, considering both algorithmic aspects and the profiling of user interaction inside the OntoLing framework. © Springer-Verlag Berlin Heidelberg 2006